Which bins are most highly expressed in the two conditions? We will integrate this information with the taxonomic assignments. Complete taxonomic assignments are available here.
The following heatmap shows expression levels (in TPM) grouped by bin. You can hover over each cell of the heatmap to get more information.
We run DESeq2 on the Salmon output.
We prefilter the dataset (for visualization purposes only). How many genes are in the raw count data?
## [1] 116479
We keep only the rows (= genes) with more than 1 count in total across all samples. How many genes do we have now?
## [1] 51991
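The filtering rule above can be illustrated with a minimal sketch. It is written in plain Python on a made-up count table, not with the DESeq2 objects used for this report, and it assumes that "more than 1 count across all samples" refers to the sum of a gene's counts over all samples (in R this is commonly written as `rowSums(counts(dds)) > 1`). The gene names, counts, and the `prefilter` helper are hypothetical.

```python
# Toy count matrix: one row per gene, one column per sample (made-up numbers).
counts = {
    "gene_a": [0, 0, 0, 0],
    "gene_b": [0, 1, 0, 0],
    "gene_c": [5, 0, 2, 1],
    "gene_d": [120, 98, 143, 110],
}


def prefilter(count_rows, min_total=1):
    """Keep genes whose counts summed over all samples exceed min_total."""
    return {gene: row for gene, row in count_rows.items() if sum(row) > min_total}


kept = prefilter(counts)
print(len(counts), "genes before filtering,", len(kept), "after")
```

Genes with zero or a single read in total carry essentially no information for plots, so dropping them mainly reduces the size of the table.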
We need to choose between transformation methods for running exploratory analyses.
Scatterplot of transformed counts from two samples.
Which samples are similar to each other, and which are different? Does this fit the expectation from the experiment’s design? We use the VST-transformed results for the following:
Heatmap of sample-to-sample distances using the variance stabilizing transformed values.
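As a rough illustration of what such a distance heatmap displays, the sketch below computes pairwise distances between samples from a small table of transformed values. It assumes Euclidean distance, which is a common choice but is not confirmed by this report; the sample names, values, and the `sample_distances` helper are hypothetical.

```python
import math

# Hypothetical transformed expression values: rows are genes, columns are samples.
samples = ["ctrl_1", "ctrl_2", "treat_1"]
transformed = [
    [5.0, 5.0, 8.0],
    [7.0, 7.0, 3.0],
    [6.0, 7.0, 6.0],
]


def sample_distances(matrix):
    """Return the symmetric matrix of Euclidean distances between columns."""
    columns = list(zip(*matrix))  # one tuple of gene values per sample
    return [[math.dist(a, b) for b in columns] for a in columns]


dist = sample_distances(transformed)
for name, row in zip(samples, dist):
    print(name, [round(d, 2) for d in row])
```

In this toy table the two control samples end up much closer to each other than either is to the treated sample, which is the pattern one would look for in the heatmap.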
Principal component analysis (PCA)
Heatmap showing how much each gene deviates in a specific sample from that gene’s average across all samples. Please note: this is NOT a list of differentially expressed genes! The 20 genes with the highest variance across samples are shown. The complete table is available here in interactive HTML format and here in CSV format.
Details about the heatmap construction…
In the sample distance heatmap made previously, the dendrogram at the side shows a hierarchical clustering of the samples. Such a clustering can also be performed for the genes. Since the clustering is only relevant for genes that actually carry a signal, one would usually cluster only a subset of the most highly variable genes. Here, for demonstration, we select the 20 genes with the highest variance across samples. We work with the VST data.
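The selection and centering steps described above can be sketched as follows. This is a plain-Python illustration on invented values, not the code used to build the heatmap in this report; the gene names and the helpers `top_variable_genes` and `center_rows` are hypothetical, and only 2 genes are selected here instead of 20 to keep the example small.

```python
from statistics import mean, variance

# Hypothetical VST-like values: one row per gene, one column per sample.
vst = {
    "gene_a": [8.0, 8.1, 7.9, 8.0],
    "gene_b": [5.0, 9.0, 5.5, 9.5],
    "gene_c": [6.0, 6.5, 7.0, 6.5],
    "gene_d": [10.0, 4.0, 10.5, 3.5],
}


def top_variable_genes(values, n):
    """Return the names of the n genes with the highest variance across samples."""
    ranked = sorted(values, key=lambda gene: variance(values[gene]), reverse=True)
    return ranked[:n]


def center_rows(values, genes):
    """Subtract each gene's mean so cells show the deviation from its average."""
    centered = {}
    for gene in genes:
        row_mean = mean(values[gene])
        centered[gene] = [x - row_mean for x in values[gene]]
    return centered


top = top_variable_genes(vst, n=2)
heatmap_values = center_rows(vst, top)
for gene, row in heatmap_values.items():
    print(gene, row)
```

Centering each row means the colors reflect how far a sample sits above or below the gene's own average, rather than the gene's absolute expression level.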
Aims:
An interactive, detailed report of the DGE analysis is available here.
These files have already been linked throughout this report, but are collected here for convenience.